Overview

Dataset statistics

Number of variables: 11
Number of observations: 683
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 8
Duplicate rows (%): 1.2%
Total size in memory: 58.8 KiB
Average record size in memory: 88.2 B

Variable types

Numeric: 10
Categorical: 1

Warnings

Dataset has 8 (1.2%) duplicate rows (Duplicates)
UofCSize is highly correlated with UofCShape (High correlation)
UofCShape is highly correlated with UofCSize (High correlation)
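
The duplicate-row warning can be checked outside the report. The following is a minimal sketch in plain Python, assuming the rows are available as tuples. `count_duplicate_rows` is a hypothetical helper, and the counting rule (every copy beyond the first counts as a duplicate) is an assumption rather than the report's documented definition. The toy rows are taken from the "Duplicate rows" and "First rows" tables of this report.

```python
from collections import Counter

def count_duplicate_rows(rows):
    """Count rows that repeat an earlier, identical row (assumed rule)."""
    counts = Counter(rows)
    return sum(n - 1 for n in counts.values() if n > 1)

# Columns: id, nine cytology scores, Class.
rows = [
    (466906, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2),
    (466906, 1, 1, 1, 1, 2, 1, 1, 1, 1, 2),   # repeat of the row above
    (704097, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2),
    (704097, 1, 1, 1, 1, 1, 1, 2, 1, 1, 2),   # repeat of the row above
    (1000025, 5, 1, 1, 1, 2, 1, 3, 1, 1, 2),
]

duplicates = count_duplicate_rows(rows)
share = 100 * duplicates / len(rows)
print(duplicates, f"{share:.1f}%")
```

Applied to the full dataset, the same ratio would be 8 / 683, which rounds to the 1.2% shown in the warning.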

Reproduction

Analysis started: 2021-04-10 16:48:44.135944
Analysis finished: 2021-04-10 16:48:56.602177
Duration: 12.47 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml

Variables

id
Real number (ℝ≥0)

Distinct: 630
Distinct (%): 92.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1076720.227
Minimum: 63375
Maximum: 13454352
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 63375
5-th percentile: 413929.8
Q1: 877617
Median: 1171795
Q3: 1238705
95-th percentile: 1334001.2
Maximum: 13454352
Range: 13390977
Interquartile range (IQR): 361088

Descriptive statistics

Standard deviation: 620644.0477
Coefficient of variation (CV): 0.576420905
Kurtosis: 257.3684102
Mean: 1076720.227
Median Absolute Deviation (MAD): 104296
Skewness: 13.74841025
Sum: 735399915
Variance: 3.851990339 × 10^11
Monotonicity: Not monotonic
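
Several of the reported statistics are simple functions of the others, which gives a quick consistency check. The sketch below recomputes the range, IQR, coefficient of variation, and variance for the id column from the values listed in its summary. The variable names are illustrative and not part of the report.

```python
# Values copied from the summary statistics of the "id" column.
mean = 1076720.227
std = 620644.0477
minimum, maximum = 63375, 13454352
q1, q3 = 877617, 1238705

value_range = maximum - minimum   # range = maximum - minimum
iqr = q3 - q1                     # interquartile range = Q3 - Q1
cv = std / mean                   # coefficient of variation = std / mean
variance = std ** 2               # variance = std squared

print(value_range, iqr, round(cv, 6), f"{variance:.6e}")
```

The results agree with the reported range (13390977), IQR (361088), CV (about 0.576421), and variance (about 3.852 × 10^11).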
Histogram with fixed size bins (bins=50)
Common values

Value               Count  Frequency (%)
1182404             6      0.9%
1276091             5      0.7%
1198641             3      0.4%
704097              2      0.3%
1321942             2      0.3%
695091              2      0.3%
1017023             2      0.3%
385103              2      0.3%
1070935             2      0.3%
1240603             2      0.3%
Other values (620)  655    95.9%

Minimum 5 values

Value               Count  Frequency (%)
63375               1      0.1%
76389               1      0.1%
95719               1      0.1%
128059              1      0.1%
142932              1      0.1%

Maximum 5 values

Value               Count  Frequency (%)
13454352            1      0.1%
8233704             1      0.1%
1371920             1      0.1%
1371026             1      0.1%
1369821             1      0.1%

Clump Thickness
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.442166911
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 4
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.820761319
Coefficient of variation (CV): 0.6349966977
Kurtosis: -0.6331245309
Mean: 4.442166911
Median Absolute Deviation (MAD): 2
Skewness: 0.5876542361
Sum: 3034
Variance: 7.956694418
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
1      139    20.4%
5      128    18.7%
3      104    15.2%
4      79     11.6%
10     69     10.1%
2      50     7.3%
8      44     6.4%
6      33     4.8%
7      23     3.4%
9      14     2.0%

Minimum 5 values

Value  Count  Frequency (%)
1      139    20.4%
2      50     7.3%
3      104    15.2%
4      79     11.6%
5      128    18.7%

Maximum 5 values

Value  Count  Frequency (%)
10     69     10.1%
9      14     2.0%
8      44     6.4%
7      23     3.4%
6      33     4.8%

UofCSize
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.150805271
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.065144856
Coefficient of variation (CV): 0.9728131675
Kurtosis: 0.0736791399
Mean: 3.150805271
Median Absolute Deviation (MAD): 0
Skewness: 1.226404096
Sum: 2152
Variance: 9.395112987
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
1      373    54.6%
10     67     9.8%
3      52     7.6%
2      45     6.6%
4      38     5.6%
5      30     4.4%
8      28     4.1%
6      25     3.7%
7      19     2.8%
9      6      0.9%

Minimum 5 values

Value  Count  Frequency (%)
1      373    54.6%
2      45     6.6%
3      52     7.6%
4      38     5.6%
5      30     4.4%

Maximum 5 values

Value  Count  Frequency (%)
10     67     9.8%
9      6      0.9%
8      28     4.1%
7      19     2.8%
6      25     3.7%

UofCShape
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.21522694
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 5
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.988580818
Coefficient of variation (CV): 0.929508515
Kurtosis: -0.01681562061
Mean: 3.21522694
Median Absolute Deviation (MAD): 0
Skewness: 1.157890012
Sum: 2196
Variance: 8.931615308
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
1      346    50.7%
10     58     8.5%
2      58     8.5%
3      53     7.8%
4      43     6.3%
5      32     4.7%
7      30     4.4%
6      29     4.2%
8      27     4.0%
9      7      1.0%

Minimum 5 values

Value  Count  Frequency (%)
1      346    50.7%
2      58     8.5%
3      53     7.8%
4      43     6.3%
5      32     4.7%

Maximum 5 values

Value  Count  Frequency (%)
10     58     8.5%
9      7      1.0%
8      27     4.0%
7      30     4.4%
6      29     4.2%

Marginal Adhesion
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.830161054
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.86456219
Coefficient of variation (CV): 1.012155187
Kurtosis: 0.9424072094
Mean: 2.830161054
Median Absolute Deviation (MAD): 0
Skewness: 1.509181064
Sum: 1933
Variance: 8.205716543
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
1      393    57.5%
3      58     8.5%
2      58     8.5%
10     55     8.1%
4      33     4.8%
8      25     3.7%
5      23     3.4%
6      21     3.1%
7      13     1.9%
9      4      0.6%

Minimum 5 values

Value  Count  Frequency (%)
1      393    57.5%
2      58     8.5%
3      58     8.5%
4      33     4.8%
5      23     3.4%

Maximum 5 values

Value  Count  Frequency (%)
10     55     8.1%
9      4      0.6%
8      25     3.7%
7      13     1.9%
6      21     3.1%

SECSize
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.234260615
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 2
Q3: 4
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.223085456
Coefficient of variation (CV): 0.6873550777
Kurtosis: 2.129639279
Mean: 3.234260615
Median Absolute Deviation (MAD): 0
Skewness: 1.703716401
Sum: 2209
Variance: 4.942108947
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
2      376    55.1%
3      71     10.4%
4      48     7.0%
1      44     6.4%
6      40     5.9%
5      39     5.7%
10     31     4.5%
8      21     3.1%
7      11     1.6%
9      2      0.3%

Minimum 5 values

Value  Count  Frequency (%)
1      44     6.4%
2      376    55.1%
3      71     10.4%
4      48     7.0%
5      39     5.7%

Maximum 5 values

Value  Count  Frequency (%)
10     31     4.5%
9      2      0.3%
8      21     3.1%
7      11     1.6%
6      40     5.9%

Bare Nuclei
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.54465593
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 6
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.64385716
Coefficient of variation (CV): 1.027986138
Kurtosis: -0.7988441354
Mean: 3.54465593
Median Absolute Deviation (MAD): 0
Skewness: 0.9900156547
Sum: 2421
Variance: 13.27769501
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
1      402    58.9%
10     132    19.3%
5      30     4.4%
2      30     4.4%
3      28     4.1%
8      21     3.1%
4      19     2.8%
9      9      1.3%
7      8      1.2%
6      4      0.6%

Minimum 5 values

Value  Count  Frequency (%)
1      402    58.9%
2      30     4.4%
3      28     4.1%
4      19     2.8%
5      30     4.4%

Maximum 5 values

Value  Count  Frequency (%)
10     132    19.3%
9      9      1.3%
8      21     3.1%
7      8      1.2%
6      4      0.6%

Bland Chromatin
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.445095168
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
Median: 3
Q3: 5
95-th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.449696573
Coefficient of variation (CV): 0.7110678959
Kurtosis: 0.1676456428
Mean: 3.445095168
Median Absolute Deviation (MAD): 1
Skewness: 1.095270469
Sum: 2353
Variance: 6.001013297
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
3      161    23.6%
2      160    23.4%
1      150    22.0%
7      71     10.4%
4      39     5.7%
5      34     5.0%
8      28     4.1%
10     20     2.9%
9      11     1.6%
6      9      1.3%

Minimum 5 values

Value  Count  Frequency (%)
1      150    22.0%
2      160    23.4%
3      161    23.6%
4      39     5.7%
5      34     5.0%

Maximum 5 values

Value  Count  Frequency (%)
10     20     2.9%
9      11     1.6%
8      28     4.1%
7      71     10.4%
6      9      1.3%

Normal Nucleoli
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.869692533
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 4
95-th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.052666407
Coefficient of variation (CV): 1.063760794
Kurtosis: 0.4735882982
Mean: 2.869692533
Median Absolute Deviation (MAD): 0
Skewness: 1.420431124
Sum: 1960
Variance: 9.318772193
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values

Value  Count  Frequency (%)
1      432    63.3%
10     60     8.8%
3      42     6.1%
2      36     5.3%
8      23     3.4%
6      22     3.2%
5      19     2.8%
4      18     2.6%
7      16     2.3%
9      15     2.2%

Minimum 5 values

Value  Count  Frequency (%)
1      432    63.3%
2      36     5.3%
3      42     6.1%
4      18     2.6%
5      19     2.8%

Maximum 5 values

Value  Count  Frequency (%)
10     60     8.8%
9      15     2.2%
8      23     3.4%
7      16     2.3%
6      22     3.2%

Mitoses
Real number (ℝ≥0)

Distinct: 9
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.603221083
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 1
Median: 1
Q3: 1
95-th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.732674146
Coefficient of variation (CV): 1.080745609
Kurtosis: 12.27337364
Mean: 1.603221083
Median Absolute Deviation (MAD): 0
Skewness: 3.511476241
Sum: 1095
Variance: 3.002159697
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Common values

Value  Count  Frequency (%)
1      563    82.4%
2      35     5.1%
3      33     4.8%
10     14     2.0%
4      12     1.8%
7      9      1.3%
8      8      1.2%
5      6      0.9%
6      3      0.4%

Minimum 5 values

Value  Count  Frequency (%)
1      563    82.4%
2      35     5.1%
3      33     4.8%
4      12     1.8%
5      6      0.9%

Maximum 5 values

Value  Count  Frequency (%)
10     14     2.0%
8      8      1.2%
7      9      1.3%
6      3      0.4%
5      6      0.9%

Class
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB

Value  Count
2      444
4      239

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 683
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2
2nd row: 2
3rd row: 2
4th row: 2
5th row: 2
Value  Count  Frequency (%)
2      444    65.0%
4      239    35.0%

Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
2      444    65.0%
4      239    35.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  683    100.0%

Most frequent character per category

Value  Count  Frequency (%)
2      444    65.0%
4      239    35.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  683    100.0%

Most frequent character per script

Value  Count  Frequency (%)
2      444    65.0%
4      239    35.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  683    100.0%

Most frequent character per block

Value  Count  Frequency (%)
2      444    65.0%
4      239    35.0%

Interactions

[Figures: pairwise interaction plots of the numeric variables]

Correlations


Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
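
The calculation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the implementation the report itself uses. `pearson_r` is a hypothetical helper name and the input values are toy data, not taken from the dataset.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n factors of the covariance and the two variances cancel out,
    # so plain sums of products are sufficient.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy values for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

print(pearson_r(x, y))                       # exact linear relation
print(pearson_r(x, [3 * v + 7 for v in y]))  # same r after shifting and rescaling y
```

The second call illustrates the invariance under changes in location and scale: a positive rescaling and a shift of one variable leave r unchanged.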

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
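
As a rough sketch of this definition, ρ can be computed by ranking both variables and applying the Pearson formula to the ranks. The helper names below are hypothetical, and assigning tied values the average of their rank positions is an assumption (a common convention, relevant here because the 1 to 10 scores contain many ties). The data are toy values.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Covariance divided by the product of the standard deviations."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Toy data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this toy input ρ reaches 1 because the relation is perfectly monotonic, while Pearson's r stays below 1 because the relation is not linear.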

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
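
The pair-counting rule above corresponds to the variant usually called tau-a. A minimal sketch follows, with a hypothetical helper name and toy data. Treating tied pairs as neither concordant nor discordant is an assumption of this sketch. Statistical libraries commonly default to tau-b, which adds a correction for ties, so their results can differ from this formula on heavily tied data such as the 1 to 10 scores in this dataset.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # A pair is concordant when x and y move in the same direction.
        product = (x[i] - x[j]) * (y[i] - y[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
        # product == 0 means a tie in x or y; it counts toward neither total.
    return (concordant - discordant) / len(pairs)

# Toy data: one swapped pair out of six.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
print(kendall_tau_a(x, y))
```

Here five of the six pairs are concordant and one is discordant, so τ = (5 - 1) / 6.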

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   id       Clump Thickness  UofCSize  UofCShape  Marginal Adhesion  SECSize  Bare Nuclei  Bland Chromatin  Normal Nucleoli  Mitoses  Class
0  1000025  5                1         1          1                  2        1            3                1                1        2
1  1002945  5                4         4          5                  7        10           3                2                1        2
2  1015425  3                1         1          1                  2        2            3                1                1        2
3  1016277  6                8         8          1                  3        4            3                7                1        2
4  1017023  4                1         1          3                  2        1            3                1                1        2
5  1017122  8                10        10         8                  7        10           9                7                1        4
6  1018099  1                1         1          1                  2        10           3                1                1        2
7  1018561  2                1         2          1                  2        1            3                1                1        2
8  1033078  2                1         1          1                  2        1            1                1                5        2
9  1033078  4                2         1          1                  2        1            2                1                1        2

Last rows

     id       Clump Thickness  UofCSize  UofCShape  Marginal Adhesion  SECSize  Bare Nuclei  Bland Chromatin  Normal Nucleoli  Mitoses  Class
673  654546   1                1         1          1                  2        1            1                1                8        2
674  654546   1                1         1          3                  2        1            1                1                1        2
675  695091   5                10        10         5                  4        5            4                4                1        4
676  714039   3                1         1          1                  2        1            1                1                1        2
677  763235   3                1         1          1                  2        1            2                1                2        2
678  776715   3                1         1          1                  3        2            1                1                1        2
679  841769   2                1         1          1                  2        1            1                1                1        2
680  888820   5                10        10         3                  7        3            8                10               2        4
681  897471   4                8         6          4                  3        4            10               6                1        4
682  897471   4                8         8          5                  4        5            10               4                1        4

Duplicate rows

Most frequent

   id       Clump Thickness  UofCSize  UofCShape  Marginal Adhesion  SECSize  Bare Nuclei  Bland Chromatin  Normal Nucleoli  Mitoses  Class  count
0  320675   3                3         5          2                  3        10           7                1                1        4      2
1  466906   1                1         1          1                  2        1            1                1                1        2      2
2  704097   1                1         1          1                  1        1            2                1                1        2      2
3  1100524  6                10        10         2                  8        10           7                3                3        4      2
4  1116116  9                10        10         1                  10       8            3                3                1        4      2
5  1198641  3                1         1          1                  2        1            3                1                1        2      2
6  1218860  1                1         1          1                  1        1            3                1                1        2      2
7  1321942  5                1         1          1                  2        1            3                1                1        2      2